Post-treatment breast cancer prognosis

ABSTRACT

The disclosure includes the identification and use of gene expression profiles, or patterns, with clinical relevance to extended treatment and cancer-free survival in a patient. In particular, the disclosure includes the identities of genes that are expressed in correlation with benefit in a switch in endocrine therapy used to treat a patient. The levels of gene expression are disclosed as a molecular index for predicting clinical outcome, and so prognosis, for the patient. The disclosure further includes methods for predicting cancer recurrence, and/or predicting occurrence of metastatic cancer, after initial treatment with an anti-estrogen agent. The disclosure further includes methods for determining or selecting the treatment of a subject based upon the likelihood of life expectancy, cancer recurrence, and/or cancer metastasis.

RELATED APPLICATIONS

This application is a continuation of PCT Application PCT/US2011/064290, filed Dec. 9, 2011 and published as WO 2012/079059 with designation of the U.S., and which claims benefit of priority to U.S. Provisional Patent Application 61/421,627, filed Dec. 9, 2010, both of which are hereby incorporated by reference in their entireties as if fully set forth herein.

This application is related to International Application No. PCT/US2008/075528, filed on Sep. 6, 2008 (published as WO 2009/108215 A1) with designation of the U.S., and to U.S. patent application Ser. No. 12/718,973, filed Mar. 6, 2010. Both applications are hereby incorporated by reference as if fully set forth herein.

FIELD OF THE DISCLOSURE

The disclosure relates to the identification and use of gene expression profiles, or patterns, with clinical relevance to breast cancer. In particular, the disclosure is based in part on the identities of genes that are expressed in correlation with the likelihood of cancer recurrence after initial treatment with an aromatase inhibitor or other endocrine therapy. The levels of gene expression form a molecular index that is able to predict clinical outcome, and so prognosis, for a patient after initial treatment with an aromatase inhibitor or other endocrine therapy.

The gene expression profiles, whether embodied in nucleic acid expression, protein expression, or other expression formats, may be used to predict the post-treatment clinical outcome of subjects afflicted with breast cancer, predict cancer recurrence, and/or predict occurrence of metastatic cancer. The profiles may also be used in the study of a subject's prognosis. When used for prognosis, the profiles are used to determine the treatment of cancer based upon the likelihood of life expectancy, cancer recurrence, and/or cancer metastasis.

BACKGROUND OF THE DISCLOSURE

The treatment of breast cancer has been a field of intense interest and study. After initial diagnosis of breast cancer by analysis of a sample of breast cancer cells from a subject, treatment methods often begin with surgical removal of the tumor cells. In cases of hormone-dependent breast cancer, such as estrogen receptor positive (ER+) breast cancer, the surgery is followed by antagonizing estrogen to reduce tumor growth or re-growth. In many cases, treatment with the anti-estrogen tamoxifen is used for five years to reduce the risk of disease recurrence and so breast cancer mediated mortality.

Unfortunately, data from the field indicate that more than half of all breast cancer recurrences occur after five years of treatment with adjuvant tamoxifen.

Goss et al. (J. Clin. Oncol., 26(12):1948-1955, 2008) report results from a trial examining the use of letrozole started within 3 months after five years of adjuvant tamoxifen in subjects with primary ER+ breast cancer. The results suggested that post-tamoxifen treatment with letrozole improves breast cancer-free survival and distant breast cancer-free survival.

But Goss et al. provided no means by which to predict which subjects, treated for five years with tamoxifen, would benefit from subsequent letrozole treatment. Therefore, there was no means to direct letrozole treatment only to the subjects for whom a benefit is expected. So letrozole treatment was applied to subjects for whom no benefit would have been expected, resulting in an overtreatment of the population of breast cancer-free subjects treated for five years with tamoxifen.

The citation of documents herein is not to be construed as reflecting an admission that any is relevant prior art. Moreover, their citation is not an indication of a search for relevant disclosures. All statements regarding the dates or contents of the documents are based on available information and are not an admission as to their accuracy or correctness.

BRIEF SUMMARY OF THE DISCLOSURE

The disclosure is based in part on the discovery and determination of gene expression levels in breast cancer tumor cells that are correlated with a beneficial switch in anti-breast cancer chemotherapy. In some cases, the switch is from one form of endocrine therapy to another. The expression levels may be used to provide prognostic information, such as cancer recurrence, and predictive information, such as responsiveness to certain therapies.

In a first aspect, the disclosure includes a method to identify, or classify, a population of subjects initially treated with an anti-estrogen or anti-aromatase therapy into at least two subpopulations. A first subpopulation would be expected to benefit from a switch in therapy, such as a switch to another anti-estrogen or anti-aromatase therapy. A second subpopulation would not be expected to benefit. In some cases, the initial therapy is with tamoxifen, such as adjuvant tamoxifen therapy for a period of about five years or less. Optionally, the switch is to letrozole, or other anti-aromatase, therapy. The disclosure includes means for a population of subjects treated in this manner, and breast cancer-free during treatment, to be classified into the first, and/or the second, subpopulations.

The methods of the disclosure are based on the expression levels of certain genes, including the expression level of HoxB13, in breast cancer cells of a subject. In some embodiments, a two-gene ratio of HoxB13 expression to IL17BR expression (or HoxB13:IL17BR ratio) may be used (see Ma et al., J. Clin. Oncol., 24:4611-9 (2006)). In alternative embodiments, a two-gene ratio of HoxB13 expression to CHDH expression may be used.
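As an illustration of the two-gene ratio described above, the following sketch derives an expression ratio from real-time PCR cycle threshold (Ct) values. It assumes that the amount of product doubles with each PCR cycle, so that a difference of one cycle corresponds to a two-fold difference in expression. That assumption, the function names, the Ct values, and the cutoff in the usage lines are hypothetical and are not taken from the disclosure or from Ma et al.

```python
def two_gene_ratio(ct_numerator: float, ct_denominator: float) -> float:
    """Return the numerator:denominator expression ratio from Ct values.

    Assumption (not from the disclosure): product doubles every cycle, so
    a lower Ct means higher expression and ratio = 2 ** (Ct_den - Ct_num).
    """
    return 2.0 ** (ct_denominator - ct_numerator)


def classify_by_ratio(ratio: float, cutoff: float) -> str:
    """Label a sample 'high' or 'low' relative to a caller-supplied cutoff."""
    return "high" if ratio >= cutoff else "low"


# Hypothetical Ct values for HoxB13 (numerator) and IL17BR (denominator).
h_to_i = two_gene_ratio(ct_numerator=25.0, ct_denominator=27.0)
print(h_to_i)                                 # 4.0
print(classify_by_ratio(h_to_i, cutoff=1.0))  # high (cutoff is illustrative only)
```

The same function could be applied to the alternative HoxB13:CHDH ratio by passing the CHDH Ct value as the denominator.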

The HoxB13:IL17BR (H:I) ratio was discovered based upon a study of novel biomarkers predictive of clinical outcome beyond standard prognostic factors. Patients who developed cancer recurrences were matched to those who did not with respect to tumor stage and grade. The simple H:I ratio was found to be suitable for predicting cancer recurrence in patients with estrogen receptor-positive (ER+) breast cancer receiving adjuvant tamoxifen therapy. Subsequent studies (Goetz et al., Clin Cancer Res. 12:2080-7 (2006); Jerevall et al., Breast Cancer Res. Treat (2007); Jansen et al., J. Clin. Oncol. 25:662-8 (2007)) have further shown that the ratio is both prognostic, such as by being an indicator of tumor aggressiveness, and predictive of tamoxifen benefit within both retrospective and randomized clinical trials.

In further embodiments, the disclosure includes one or more additional genes in combination with HoxB13 expression. The combination may be with any one, two, three, four or all five of the additionally disclosed genes as follows.

The additional genes of the disclosure encode Bub1B (“budding uninhibited by benzimidazoles 1 beta”) or p21 protein-activated kinase 6 (PAK6); CENPA (centromere protein A, isoform a); NEK2 (NIMA-related kinase 2 or “never in mitosis gene a”-related kinase 2); RACGAP1 (Rac GTPase activating protein 1); and RRM2 (ribonucleotide reductase M2). The use of these five genes alone is referred to herein as the Molecular Grade Index (MGI). Aspects of the disclosure include compositions and methods for the use of HoxB13 expression, with or without IL17BR expression, in combination with expression level(s) of one or more of the above five genes to study, to provide prognostic information, and/or provide predictions of clinical responsiveness.

Thus the disclosure is based in part on the discovery that gene expression level(s) are useful for providing prognostic determinations (such as the likelihood of cancer recurrence in the form of breast cancer recurrence either locally or distally or in the form of metastasis) and predictive determinations (such as responsiveness to a course of treatment) for a subject. The use of all seven disclosed genes is referred to as the Breast Cancer Index (BCI).

When the expression levels of the BCI were analyzed using real-time reverse transcription-polymerase chain reaction (RT-PCR), the combination was found to provide superior stratification of risk of recurrence in subjects treated with five years of tamoxifen therapy. This reflects an unexpected discovery because it identifies for the first time a predictor for beneficial switching of breast cancer therapies.

In additional aspects, HoxB13 expression, and/or the BCI, may be used to predict late recurrence of cancer in a breast cancer patient. Non-limiting examples of late recurrence include recurrence after 5 years of treatment with tamoxifen, and also recurrence after 4 years, after 3 years, or after 2 years or less time of treatment with tamoxifen. Similarly, HoxB13 expression, and/or the BCI, may be used to predict responsiveness to letrozole or other anti-estrogen or anti-aromatase therapy after the above time periods to inhibit late recurrence.

Embodiments of the disclosure include an assay method with prognostic value and predictive value for stratifying subjects with original ER+ breast cancer and subsequent breast cancer-free treatment. As a prognostic, the stratification may be based on differential expression levels that correlate with, and so indicate, need for a switch in breast cancer therapies as a non-limiting example. As a non-limiting example, the stratification (based on expression levels) may be used to predict endocrine sensitivity (such as sensitivity to letrozole as a non-limiting example) and/or benefit from anti-estrogen and/or anti-aromatase inhibitors. The detection of gene expression may of course be in any suitable cell containing sample as described herein. Non-limiting examples of cells for use in the disclosure include those freshly isolated from the subject, those frozen after isolation, and those that are fixed and/or embedded, such as formalin fixed, paraffin embedded (FFPE). In most embodiments, the cells are breast cells, such as breast cancer cells.

In some embodiments, a method based on the expression levels is advantageously used on a breast cancer cell containing sample from a subject, such as a ductal carcinoma in situ (DCIS) sample. As a non-limiting example, the cell may be one from a pre-operative histological sample used to diagnose cancer in the subject. For such a subject, the standard of care is surgery, with breast conserving surgery preferred over a radical mastectomy, to remove the DCIS. This is often followed by post-operative radiotherapy, optionally with endocrine therapy, such as treatment with tamoxifen, a selective estrogen receptor modulator (SERM), a selective estrogen receptor down-regulator (SERD), or an aromatase inhibitor (AI) such as letrozole. In other post-operative cases, endocrine therapy is administered without radiation, and optionally with chemotherapy.

The instant disclosure is directed to the identification of a subject as expected to benefit from a switch in endocrine therapy, such as from one type of endocrine therapy to another, after breast cancer-free survival during the course of the initial endocrine therapy. In additional embodiments, the switch may be made after an initial course of endocrine therapy for 5 years, 4 years, 3 years, or 2 years.

The disclosure also includes detecting gene expression where high HoxB13 expression is an indicator of increased likelihood of cancer recurrence in the subject following an initial endocrine therapy, such as adjuvant tamoxifen therapy. The methods may thus include identifying the subject as likely, or unlikely, to experience local cancer recurrence, and further include switching treatment modalities for the subject to address the expected outcome. As a non-limiting example, determination of a likelihood of recurrence in the absence of an extended, post-initial treatment, therapy may be used to confirm the suitability of, or to select, an extended therapy with a switch in the anti-estrogen and/or anti-aromatase modality used.

In some cases, the disclosed methods may be used to select or eliminate therapies for premenopausal women, or for postmenopausal women, that have undergone treatment with endocrine therapy and remained cancer-free during that time. Premenopausal women include those who are less than about 35 years of age. The method may include assaying a breast cancer cell containing sample from a subject for expression of the disclosed genes. As a non-limiting example, the cell may be one from a pre-operative histological sample used to diagnose cancer in the subject.

Non-limiting examples of endocrine therapy include treatment with an SERM, such as tamoxifen, or an SERD, or an aromatase inhibitor (AI). Non-limiting examples of an AI include non-steroidal inhibitors such as letrozole and anastrozole and irreversible steroidal inhibitors such as exemestane.

DETAILED DESCRIPTION OF MODES OF PRACTICING THE DISCLOSURE

Definitions of Terms as Used Herein

A gene expression “pattern” or “profile” or “signature” refers to the relative expression of one or more genes between two or more clinical outcomes, cancer outcomes, cancer recurrence and/or survival outcomes which is correlated with being able to distinguish between said outcomes. In some cases, the outcome is that of breast cancer.

A “gene” is a polynucleotide that encodes a discrete product, whether RNA or proteinaceous in nature. It is appreciated that more than one polynucleotide may be capable of encoding a discrete product. The term includes alleles and polymorphisms of a gene that encodes the same product, or a functionally associated (including gain, loss, or modulation of function) analog thereof, based upon chromosomal location and ability to recombine during normal mitosis.

The terms “correlate” or “correlation” or equivalents thereof refer to an association between expression of one or more genes and a physiologic state of a cell to the exclusion of one or more other states as identified by use of the methods as described herein. A gene may be expressed at a higher or a lower level and still be correlated with one or more cancer states or outcomes.

A “polynucleotide” is a polymeric form of nucleotides of any length, either ribonucleotides or deoxyribonucleotides. This term refers only to the primary structure of the molecule. Thus, this term includes double- and single-stranded DNA and RNA. It also includes known types of modifications including labels known in the art, methylation, “caps”, substitution of one or more of the naturally occurring nucleotides with an analog, and internucleotide modifications such as uncharged linkages (e.g., phosphorothioates, phosphorodithioates, etc.), as well as unmodified forms of the polynucleotide.

The term “amplify” is used in the broad sense to mean creating an amplification product, which can be made enzymatically with DNA or RNA polymerases. “Amplification,” as used herein, generally refers to the process of producing multiple copies of a desired sequence, particularly those of a sample. “Multiple copies” mean at least 2 copies. A “copy” does not necessarily mean perfect sequence complementarity or identity to the template sequence.

By “corresponding” is meant that a nucleic acid molecule shares a substantial amount of sequence identity with another nucleic acid molecule. Substantial amount means at least 95%, usually at least 98% and more usually at least 99%, and sequence identity is determined using the BLAST algorithm, as described in Altschul et al., J. Mol. Biol. 215:403-410 (1990) (using the published default setting, i.e. parameters w=4, t=17). Methods for amplifying mRNA are generally known in the art, and include reverse transcription PCR (RT-PCR) and those described in U.S. patent application Ser. No. 10/062,857 (filed on Oct. 25, 2001), as well as U.S. Provisional Patent Applications 60/298,847 (filed Jun. 15, 2001) and 60/257,801 (filed Dec. 22, 2000), all of which are hereby incorporated by reference in their entireties as if fully set forth. Another method which may be used is quantitative PCR (or Q-PCR). Alternatively, RNA may be directly labeled as the corresponding cDNA by methods known in the art.
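The identity thresholds given above for “corresponding” sequences (at least 95%, 98%, or 99%) can be illustrated with a simple percent-identity calculation over two sequences that have already been aligned. This sketch is not the BLAST algorithm and does not perform any alignment; it only counts matching positions, and the function names and the gap character “-” are assumptions made for illustration.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent of aligned positions at which two sequences are identical.

    Both inputs must already be aligned to the same length, with '-' as the
    gap character. Columns that are gaps in both sequences are ignored.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" and y == "-":
            continue  # shared gap column carries no information
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no aligned positions to compare")
    return 100.0 * matches / compared


def is_corresponding(aligned_a: str, aligned_b: str, threshold: float = 95.0) -> bool:
    """True if identity meets the threshold (95% is the lowest value stated above)."""
    return percent_identity(aligned_a, aligned_b) >= threshold


reference = "ACGT" * 25               # 100 nucleotides
variant = reference[:96] + "TTTT"     # last four positions: 3 mismatches, 1 match
print(percent_identity(reference, variant))          # 97.0
print(is_corresponding(reference, variant))          # True at the 95% threshold
print(is_corresponding(reference, variant, 98.0))    # False at the 98% threshold
```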

A “microarray” is a linear or two-dimensional array of preferably discrete regions, each having a defined area, formed on the surface of a solid support such as, but not limited to, glass, plastic, or synthetic membrane. The density of the discrete regions on a microarray is determined by the total numbers of immobilized polynucleotides to be detected on the surface of a single solid phase support, preferably at least about 50/cm², more preferably at least about 100/cm², even more preferably at least about 500/cm², but preferably below about 1,000/cm². Preferably, the arrays contain less than about 500, about 1000, about 1500, about 2000, about 2500, or about 3000 immobilized polynucleotides in total. As used herein, a DNA microarray is an array of oligonucleotides or polynucleotides placed on a chip or other surfaces used to hybridize to amplified or cloned polynucleotides from a sample. Since the position of each particular group of primers in the array is known, the identities of a sample's polynucleotides can be determined based on their binding to a particular position in the microarray.
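The density figures above can be checked with simple arithmetic: density is the number of immobilized polynucleotides divided by the support area. The sketch below uses the stated bounds of 50/cm² and 1,000/cm² as defaults and ignores the qualifier “about”; the function names and the example array are hypothetical.

```python
def feature_density(n_polynucleotides: int, area_cm2: float) -> float:
    """Immobilized polynucleotides per square centimeter of support."""
    if area_cm2 <= 0:
        raise ValueError("area must be positive")
    return n_polynucleotides / area_cm2


def in_preferred_density_range(density: float,
                               minimum: float = 50.0,
                               maximum: float = 1000.0) -> bool:
    """True if density is at least `minimum` and below `maximum` (per cm^2)."""
    return minimum <= density < maximum


# Hypothetical array: 1500 polynucleotides on a 2 cm^2 support.
density = feature_density(1500, 2.0)
print(density)                               # 750.0
print(in_preferred_density_range(density))   # True
```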

Because the disclosure relies upon the identification of genes that are over- or under-expressed, one embodiment of the disclosure involves determining expression by hybridization of mRNA, or an amplified or cloned version thereof, of a sample cell to a polynucleotide that is unique to a particular gene sequence. Preferred polynucleotides of this type contain at least about 20, at least about 22, at least about 24, at least about 26, at least about 28, at least about 30, or at least about 32 consecutive basepairs of a gene sequence that is not found in other gene sequences. The term “about” as used in the previous sentence refers to an increase or decrease of 1 from the stated numerical value. Even more preferred are polynucleotides of at least or about 50, at least or about 100, at least or about 150, at least or about 200, at least or about 250, at least or about 300, at least or about 350, or at least or about 400 basepairs of a gene sequence that is not found in other gene sequences. The term “about” as used in the preceding sentence refers to an increase or decrease of 10% from the stated numerical value. Such polynucleotides may also be referred to as polynucleotide probes that are capable of hybridizing to sequences of the genes, or unique portions thereof, described herein. Preferably, the sequences are those of mRNA encoded by the genes, the corresponding cDNA to such mRNAs, and/or amplified versions of such sequences. In preferred embodiments of the disclosure, the polynucleotide probes are immobilized on an array, other devices, or in individual spots that localize the probes.
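The two meanings of “about” defined above (plus or minus 1 for the shorter lengths, plus or minus 10% for the longer lengths) can be made concrete as follows. This is a minimal sketch; the function names and the `percent` flag used to select between the two definitions are assumptions made for illustration.

```python
def about_range(stated: int, percent: bool = False) -> tuple:
    """Return (low, high) covered by 'about <stated>'.

    percent=False: +/- 1 (the definition given for lengths of 20 to 32).
    percent=True:  +/- 10% (the definition given for lengths of 50 to 400).
    """
    delta = stated / 10 if percent else 1
    return (stated - delta, stated + delta)


def meets_at_least_about(length: int, stated: int, percent: bool = False) -> bool:
    """True if `length` satisfies 'at least about <stated>'."""
    low, _high = about_range(stated, percent)
    return length >= low


print(about_range(20))                              # (19, 21)
print(about_range(50, percent=True))                # (45.0, 55.0)
print(meets_at_least_about(19, 20))                 # True
print(meets_at_least_about(44, 50, percent=True))   # False
```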

In another embodiment of the disclosure, all or part of a disclosed sequence may be amplified and detected by methods such as the polymerase chain reaction (PCR) and variations thereof, such as, but not limited to, quantitative PCR (Q-PCR), reverse transcription PCR (RT-PCR), and real-time PCR, optionally real-time RT-PCR. Such methods would utilize one or two primers that are complementary to portions of a disclosed sequence, where the primers are used to prime nucleic acid synthesis. The newly synthesized nucleic acids are optionally labeled and may be detected directly or by hybridization to a polynucleotide of the disclosure. The newly synthesized nucleic acids may be contacted with polynucleotides (containing sequences) of the disclosure under conditions which allow for their hybridization.

Alternatively, and in another embodiment of the disclosure, gene expression may be determined by analysis of expressed protein in a cell sample of interest by use of one or more antibodies specific for one or more epitopes of individual gene products (proteins) in said cell sample. Such antibodies are preferably labeled to permit their easy detection after binding to the gene product.

The term “label” refers to a composition capable of producing a detectable signal indicative of the presence of the labeled molecule. Suitable labels include radioisotopes, nucleotide chromophores, enzymes, substrates, fluorescent molecules, chemiluminescent moieties, magnetic particles, bioluminescent moieties, and the like. As such, a label is any composition detectable by spectroscopic, photochemical, biochemical, immunochemical, electrical, optical or chemical means.

The term “support” refers to conventional supports such as beads, particles, dipsticks, fibers, filters, membranes and silane or silicate supports such as glass slides.

As used herein, a “cancer tissue sample” or “cancer cell sample” refers to a cell containing sample of tissue isolated from an individual afflicted with the corresponding cancer. The sample may be from material removed via a surgical procedure, such as a biopsy. Such samples are primary isolates (in contrast to cultured cells) and may be collected by any suitable means recognized in the art. In some embodiments, the “sample” may be collected by a non-invasive method, including, but not limited to, abrasion or fine needle aspiration.

A “breast tissue sample” or “breast cell sample” refers to a sample of breast tissue or fluid isolated from an individual suspected of being afflicted with, or at risk of developing, breast cancer. Such samples are primary isolates (in contrast to cultured cells) and may be collected by any non-invasive means, including, but not limited to, ductal lavage, fine needle aspiration, needle biopsy, the devices and methods described in U.S. Pat. No. 6,328,709, or any other suitable means recognized in the art. Alternatively, the “sample” may be collected by an invasive method, including, but not limited to, surgical biopsy.

“Expression” and “gene expression” include transcription and/or translation of nucleic acid material. Of course the term may also be limited, if so indicated, as referring only to the transcription of nucleic acids.

As used herein, the term “comprising” and its cognates are used in their inclusive sense; that is, equivalent to the term “including” and its corresponding cognates.

Conditions that “allow” an event to occur or conditions that are “suitable” for an event to occur, such as hybridization, strand extension, and the like, or “suitable” conditions are conditions that do not prevent such events from occurring. Thus, these conditions permit, enhance, facilitate, and/or are conducive to the event. Such conditions, known in the art and described herein, depend upon, for example, the nature of the nucleotide sequence, temperature, and buffer conditions. These conditions also depend on what event is desired, such as hybridization, cleavage, strand extension or transcription.

Sequence “mutation,” as used herein, refers to any sequence alteration in the sequence of a gene disclosed herein in comparison to a reference sequence. A sequence mutation includes single nucleotide changes, or alterations of more than one nucleotide in a sequence, due to mechanisms such as substitution, deletion or insertion. Single nucleotide polymorphism (SNP) is also a sequence mutation as used herein. Because the present disclosure is based on the relative level of gene expression, mutations in non-coding regions of genes as disclosed herein may also be assayed in the practice of the disclosure.

“Detection” includes any means of detecting, including direct and indirect detection of gene expression and changes therein. For example, “detectably less” products may be observed directly or indirectly, and the term indicates any reduction (including the absence of detectable signal). Similarly, “detectably more” product means any increase, whether observed directly or indirectly.

Increases and decreases in expression of the disclosed sequences are defined in the following terms based upon percent or fold changes over expression in normal cells. Increases may be of 10, 20, 30, 40, 50, 60, 70, 80, 90, 100, 120, 140, 160, 180, or 200% relative to expression levels in normal cells. Alternatively, fold increases may be of 1, 1.5, 2, 2.5, 3, 3.5, 4, 4.5, 5, 5.5, 6, 6.5, 7, 7.5, 8, 8.5, 9, 9.5, or fold over expression levels in normal cells. Decreases may be of 10, 20, 30, 40, 50, 55, 60, 65, 70, 75, 80, 82, 84, 86, 88, 90, 92, 94, 96, 98, 99 or 100% relative to expression levels in normal cells.
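The percent and fold changes described above are both computed relative to the expression level in normal cells. The sketch below shows one way to calculate them; the function names are hypothetical, and treating “fold” as the simple ratio of sample to normal expression is an assumption, since the text does not define the term further.

```python
def percent_change(sample_level: float, normal_level: float) -> float:
    """Percent increase (positive) or decrease (negative) relative to normal."""
    if normal_level <= 0:
        raise ValueError("normal expression level must be positive")
    return 100.0 * (sample_level - normal_level) / normal_level


def fold_change(sample_level: float, normal_level: float) -> float:
    """Ratio of sample expression to normal expression (assumed meaning of 'fold')."""
    if normal_level <= 0:
        raise ValueError("normal expression level must be positive")
    return sample_level / normal_level


# Hypothetical expression levels in arbitrary units.
print(percent_change(3.0, 2.0))   # 50.0   (a 50% increase)
print(fold_change(3.0, 2.0))      # 1.5
print(percent_change(0.5, 2.0))   # -75.0  (a 75% decrease)
```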

Unless defined otherwise all technical and scientific terms used herein have the same meaning as commonly understood to one of ordinary skill in the art to which this disclosure belongs.

General

The gene expression patterns disclosed herein are predictive factors for therapeutic benefit in a switch in endocrine therapy. In some cases, the prediction is in node-negative breast cancer patients, such as ER+ node-negative patients as a non-limiting example.

To determine the expression levels of genes in the practice of the present disclosure, any method known in the art may be utilized. In some embodiments, expression based on detection of RNA which hybridizes to the genes identified and disclosed herein is used. This is readily performed by any RNA detection or amplification+detection method known or recognized as equivalent in the art such as, but not limited to, reverse transcription-PCR, the methods disclosed in U.S. patent application Ser. No. 10/062,857 (filed on Oct. 25, 2001) as well as U.S. Provisional Patent Application 60/298,847 (filed Jun. 15, 2001) and 60/257,801 (filed Dec. 22, 2000), and methods to detect the presence, or absence, of RNA stabilizing or destabilizing sequences.

Alternatively, expression based on detection of DNA status may be used. Detection of the DNA of an identified gene as methylated or deleted may be used for genes that have decreased expression. This may be readily performed by PCR based methods known in the art, including, but not limited to, Q-PCR. Conversely, detection of the DNA of an identified gene as amplified may be used for genes that have increased expression in correlation with a particular breast cancer outcome. This may be readily performed by PCR based, fluorescent in situ hybridization (FISH) and chromosome in situ hybridization (CISH) methods known in the art.

Expression based on detection of a presence, increase, or decrease in protein levels or activity may also be used. Detection may be performed by any immunohistochemistry (IHC) based, blood based (especially for secreted proteins), antibody (including autoantibodies against the protein) based, exfoliate cell (from the cancer) based, mass spectroscopy based, and image (including use of labeled ligand) based method known in the art and recognized as appropriate for the detection of the protein. Antibody and image based methods are additionally useful for the localization of tumors after determination of cancer by use of cells obtained by a non-invasive procedure (such as ductal lavage or fine needle aspiration), where the source of the cancerous cells is not known. A labeled antibody or ligand may be used to localize the carcinoma(s) within a patient.

One embodiment using a nucleic acid based assay to determine expression is by immobilization of one or more sequences of the genes identified herein on a solid support, including, but not limited to, a solid substrate as an array or to beads or bead based technology as known in the art. Alternatively, solution based expression assays known in the art may also be used.

The immobilized gene(s) may be in the form of polynucleotides that are unique or otherwise specific to the gene(s) such that the polynucleotide would be capable of hybridizing to a DNA or RNA corresponding to the gene(s). These polynucleotides may be the full length of the gene(s) or be short sequences of the genes (up to one nucleotide shorter than the full length sequence known in the art by deletion from the 5′ or 3′ end of the sequence) that are optionally minimally interrupted (such as by mismatches or inserted non-complementary basepairs) such that hybridization with a DNA or RNA corresponding to the gene(s) is not affected. In some cases, the polynucleotides used are from the 3′ end of the gene, such as within about 350, about 300, about 250, about 200, about 150, about 100, or about 50 nucleotides from the polyadenylation signal or polyadenylation site of a gene or expressed sequence. Polynucleotides containing mutations relative to the sequences of the disclosed genes may also be used so long as the presence of the mutations still allows hybridization to produce a detectable signal.
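The selection of a 3′ end region within a stated distance of the polyadenylation site, as described above, amounts to taking a window of the transcript sequence that ends at that site. The following sketch assumes the transcript is given 5′ to 3′ as a string and that the position of the polyadenylation site is already known as a 0-based index; the function name, parameters, and example sequence are hypothetical.

```python
def three_prime_window(transcript: str, polya_site: int, within: int) -> str:
    """Return up to `within` nucleotides immediately 5' of the polyadenylation site.

    `polya_site` is the 0-based index of the first nucleotide after the
    region of interest (for example, where the poly A tail begins).
    """
    if not 0 <= polya_site <= len(transcript):
        raise ValueError("polya_site is outside the transcript")
    if within <= 0:
        raise ValueError("within must be positive")
    start = max(0, polya_site - within)  # do not run past the 5' end
    return transcript[start:polya_site]


# Hypothetical transcript: 400 nt upstream, 100 nt 3' region, 20 nt poly A tail.
transcript = "G" * 400 + "C" * 100 + "A" * 20
probe_region = three_prime_window(transcript, polya_site=500, within=50)
print(len(probe_region))           # 50
print(probe_region == "C" * 50)    # True
```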

The immobilized gene(s) may be used to determine the state of nucleic acid samples prepared from sample cancer, or breast, cell(s) for which the outcome of the sample's subject (e.g. patient from whom the sample is obtained) is not known or for confirmation of an outcome that is already assigned to the sample's subject. Without limiting the disclosure, such a cell may be from a patient with ER+ breast cancer. The immobilized polynucleotide(s) need only be sufficient to specifically hybridize to the corresponding nucleic acid molecules derived from the sample under suitable conditions.

As will be appreciated by those skilled in the art, some of the corresponding sequences noted above include 3′ poly A (or poly T on the complementary strand) stretches that do not contribute to the uniqueness of the disclosed sequences. The disclosure may thus be practiced with sequences lacking the 3′ poly A (or poly T) stretches. The uniqueness of the disclosed sequences refers to the portions or entireties of the sequences which are found only in the disclosed gene's nucleic acids, including unique sequences found at the 3′ untranslated portion of the genes. Preferred unique sequences for the practice of the disclosure are those which contribute to the consensus sequences for each of the three sets such that the unique sequences will be useful in detecting expression in a variety of individuals rather than being specific for a polymorphism present in some individuals. Alternatively, sequences unique to an individual or a subpopulation may be used. The preferred unique sequences are preferably of the lengths of polynucleotides of the disclosure as discussed herein.
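Removing a 3′ poly A stretch (or the poly T stretch at the start of the complementary strand) before choosing unique sequences can be sketched with a regular expression. The minimum run length of 10 used here is an assumption chosen only to avoid trimming a few genuine terminal A or T residues; the disclosure does not state a length, and the function names are hypothetical.

```python
import re


def strip_three_prime_poly_a(sequence: str, min_run: int = 10) -> str:
    """Remove a trailing run of at least `min_run` A residues, if present."""
    pattern = re.compile(r"A{%d,}$" % min_run, flags=re.IGNORECASE)
    return pattern.sub("", sequence)


def strip_leading_poly_t(sequence: str, min_run: int = 10) -> str:
    """Remove a leading run of at least `min_run` T residues (complementary strand)."""
    pattern = re.compile(r"^T{%d,}" % min_run, flags=re.IGNORECASE)
    return pattern.sub("", sequence)


print(strip_three_prime_poly_a("ACGTC" + "A" * 15))   # ACGTC
print(strip_three_prime_poly_a("ACGTAA"))             # ACGTAA (run too short to trim)
print(strip_leading_poly_t("T" * 12 + "GACGT"))       # GACGT
```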

To determine the (increased or decreased) expression levels of the above described sequences in the practice of the disclosure, any method known in the art may be utilized. In one embodiment of the disclosure, expression based on detection of RNA which hybridizes to polynucleotides containing the above described sequences is used. This is readily performed by any RNA detection or amplification+detection method known or recognized as equivalent in the art such as, but not limited to, reverse transcription-PCR (optionally real-time PCR), the methods disclosed in U.S. patent application Ser. No. 10/062,857 entitled “Nucleic Acid Amplification” filed on Oct. 25, 2001 as well as U.S. Provisional Patent Application 60/298,847 (filed Jun. 15, 2001) and 60/257,801 (filed Dec. 22, 2000), the methods disclosed in U.S. Pat. No. 6,291,170, and quantitative PCR. Methods to identify increased RNA stability (resulting in an observation of increased expression) or decreased RNA stability (resulting in an observation of decreased expression) may also be used. These methods include the detection of sequences that increase or decrease the stability of mRNAs containing the genes' sequences. These methods also include the detection of increased mRNA degradation.

In some embodiments of the disclosure, polynucleotides having sequences present in the 3′ untranslated and/or non-coding regions of the above disclosed sequences are used to detect expression levels of the gene sequences in cancer, or breast, cells. Such polynucleotides may optionally contain sequences found in the 3′ portions of the coding regions of the above disclosed sequences. Polynucleotides containing a combination of sequences from the coding and 3′ non-coding regions preferably have the sequences arranged contiguously, with no intervening heterologous sequences.

Alternatively, the disclosure may be practiced with polynucleotideshaving sequences present in the 5′ untranslated and/or non-codingregions of the gene sequences in cancer, or breast, cells to detecttheir levels of expression. Such polynucleotides may optionally containsequences found in the 5′ portions of the coding regions.Polynucleotides containing a combination of sequences from the codingand 5′ non-coding regions preferably have the sequences arrangedcontiguously, with no intervening heterologous sequences. The disclosuremay also be practiced with sequences present in the coding regions ofthe disclosed gene sequences.

Non-limiting polynucleotides contain sequences from 3′ or 5′ untranslated and/or non-coding regions of at least about 20, at least about 22, at least about 24, at least about 26, at least about 28, at least about 30, at least about 32, at least about 34, at least about 36, at least about 38, at least about 40, at least about 42, at least about 44, or at least about 46 consecutive nucleotides. The term “about” as used in the previous sentence refers to an increase or decrease of 1 from the stated numerical value. Even more preferred are polynucleotides containing sequences of at least or about 50, at least or about 100, at least or about 150, at least or about 200, at least or about 250, at least or about 300, at least or about 350, or at least or about 400 consecutive nucleotides. The term “about” as used in the preceding sentence refers to an increase or decrease of 10% from the stated numerical value.

Sequences from the 3′ or 5′ end of the above described coding regions as found in polynucleotides of the disclosure are of the same lengths as those described above, except that they would naturally be limited by the length of the coding region. The 3′ end of a coding region may include sequences up to the 3′ half of the coding region. Conversely, the 5′ end of a coding region may include sequences up to the 5′ half of the coding region. Of course the above described sequences, or the coding regions and polynucleotides containing portions thereof, may be used in their entireties.

Polynucleotides combining the sequences from a 3′ untranslated and/or non-coding region and the associated 3′ end of the coding region may be at least or about 100, at least or about 150, at least or about 200, at least or about 250, at least or about 300, at least or about 350, or at least or about 400 consecutive nucleotides. Preferably, the polynucleotides used are from the 3′ end of the gene, such as within about 350, about 300, about 250, about 200, about 150, about 100, or about 50 nucleotides from the polyadenylation signal or polyadenylation site of a gene or expressed sequence. Polynucleotides containing mutations relative to the sequences of the disclosed genes may also be used so long as the presence of the mutations still allows hybridization to produce a detectable signal.

In another embodiment of the disclosure, polynucleotides containing deletions of nucleotides from the 5′ and/or 3′ end of the above disclosed sequences may be used. The deletions are preferably of 1-5, 5-10, 10-15, 15-20, 20-25, 25-30, 30-35, 35-40, 40-45, 45-50, 50-60, 60-70, 70-80, 80-90, 90-100, 100-125, 125-150, 150-175, or 175-200 nucleotides from the 5′ and/or 3′ end, although the extent of the deletions would naturally be limited by the length of the disclosed sequences and the need to be able to use the polynucleotides for the detection of expression levels.

Other polynucleotides of the disclosure from the 3′ end of the above disclosed sequences include those of primers and optional probes for quantitative PCR. In some embodiments, the primers and probes are those which amplify a region less than about 350, less than about 300, less than about 250, less than about 200, less than about 150, less than about 100, or less than about 50 nucleotides from the polyadenylation signal or polyadenylation site of a gene or expressed sequence.

In yet other embodiments of the disclosure, polynucleotides containing portions of the above disclosed sequences including the 3′ end may be used. Such polynucleotides would contain at least or about 50, at least or about 100, at least or about 150, at least or about 200, at least or about 250, at least or about 300, at least or about 350, or at least or about 400 consecutive nucleotides from the 3′ end of the disclosed sequences.

The disclosure also includes polynucleotides used to detect gene expression in breast cells. The polynucleotides may comprise a shorter polynucleotide consisting of sequences found in the above genes in combination with heterologous sequences not naturally found in combination with the sequences. Non-limiting examples include short sequences from cloning vectors or present in restriction fragments used to prepare labeled probes or primers as described herein.

HoxB13 and H/I

The methods of the disclosure based on the expression levels of HoxB13 in breast cancer cells of a subject may be used as a predictor of benefit in switching endocrine therapy after an initial course of endocrine therapy. In some embodiments, a two-gene ratio of HoxB13 expression to IL17BR expression (or HoxB13:IL17BR ratio) may be used in the manner reported by Ma et al. (J. Clin. Oncol., 24:4611-9 (2006)). In alternative embodiments, a two-gene ratio of HoxB13 expression to CHDH expression may be used.

In cases using HoxB13 expression alone or the HoxB13:IL17BR (H:I) ratio, a cutoff value may be used to define breast cancer cells as having either a “high” or a “low” value corresponding to the expression. In some embodiments, a cutoff may be used to define breast cancer cells as having either a “high H/I” or a “low H/I” value. As a non-limiting example, the value of 0.06 may be used in the manner of Ma et al. In other embodiments, the cutoff may be the average expression of HoxB13 in breast cancer cells from afflicted subjects. In additional possible embodiments, the cutoff may be the average value of H/I in breast cancer cells from afflicted subjects as determined by the average HoxB13 expression/the average IL17BR expression.
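As a minimal sketch of the cutoff logic described above, the following Python fragment classifies a value as “high” or “low” against a cutoff and derives a cohort-based cutoff as the average HoxB13 expression divided by the average IL17BR expression. The function names and the numeric inputs are hypothetical and serve only as illustration; treating a value equal to the cutoff as “low” is an assumption of this sketch.

```python
from statistics import mean


def classify_by_cutoff(value, cutoff=0.06):
    """Return 'high' when the value exceeds the cutoff, otherwise 'low'.

    The default of 0.06 is the non-limiting example value given in the text.
    Assigning a value equal to the cutoff to 'low' is an assumption.
    """
    return "high" if value > cutoff else "low"


def cutoff_from_cohort(hoxb13_levels, il17br_levels):
    """Cohort cutoff: average HoxB13 expression / average IL17BR expression."""
    return mean(hoxb13_levels) / mean(il17br_levels)


# Hypothetical expression values, for illustration only.
cohort_cutoff = cutoff_from_cohort([2.0, 4.0], [1.0, 3.0])
call = classify_by_cutoff(0.07)
```

The same helper could be applied to HoxB13 expression alone by passing the cohort average of HoxB13 as the cutoff.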

MGI

The genes disclosed below have roles in the cell cycle and reported peakexpression as follows:

Gene      Peak of Expression   Role in Cell Cycle
BUB1B     G2/M                 mitotic spindle assembly checkpoint
CENPA     G2/M                 centromere assembly
NEK2      G2/M                 centrosome duplication
RACGAP1   Not Determined       Initiation of cytokinesis
RRM2      S                    DNA replication

The sequences of these genes have been previously reported and characterized in the field. For example, and on Sep. 6, 2007, the human BUB1B (also known as p21 protein-activated kinase 6 or PAK6) gene was identified by UniGene Hs.631699 and was characterized by 273 corresponding sequences. On Mar. 6, 2010, the same gene information was identified by UniGene Hs.513645 and characterized as corresponding to chromosome 15 at position 15q14 and as supported by 23 mRNA sequences and 549 EST sequences.

Also on Sep. 6, 2007, the human CENPA gene was identified by Hs.1594 (with 129 corresponding sequences). On Mar. 6, 2010, the same gene information was characterized as corresponding to chromosome 2 at 2p24-p21 and as supported by 10 mRNA sequences and 119 EST sequences.

Also on Sep. 6, 2007, the human NEK2 gene was identified by Hs.153704 (with 221 corresponding sequences). On Mar. 6, 2010, the same gene information was characterized as corresponding to chromosome 1 at 1q32.2-q41 and as supported by 17 mRNA sequences and 205 EST sequences.

Also on Sep. 6, 2007, the human RACGAP1 gene was identified by Hs.696319 (with 349 corresponding sequences). On Mar. 6, 2010, the same gene information was identified by UniGene Hs.505469 and characterized as corresponding to chromosome 12 at position 12q13.12 and as supported by 15 mRNA sequences and 398 EST sequences.

Also on Sep. 6, 2007, the human RRM2 gene was identified by Hs.226390 (with 1348 corresponding sequences). On Mar. 6, 2010, the same gene information was characterized as corresponding to chromosome 2 at 2p25-p24 and as supported by 25 mRNA sequences and 1328 EST sequences.

The mRNA and EST sequences corresponding to each of the above UniGene identifiers are hereby incorporated by reference as if fully set forth and may be used in the practice of the disclosure by the skilled person as deemed appropriate. Representative mRNA sequences for each of BUB1B, CENPA, NEK2, RACGAP1, and RRM2 have been disclosed in U.S. patent application Ser. No. 12/718,973, published as US 2011-0136680 A1 on Jun. 9, 2011. The disclosed sequences are non-limiting for the practice of the disclosed invention and are provided as evidence of the substantial knowledge in the field regarding sequences that are the disclosed genes. Additionally, the skilled person is fully capable of aligning any two or more of the known expressed sequences for each of these genes to identify an area of identity or conserved changes as a region that uniquely identifies each of these genes in comparison to other genes. Furthermore, the skilled person is fully capable of aligning any two or more of the known expressed sequences for each of these genes to identify an area unique to one or more of the expressed sequences as a region that uniquely identifies one known expressed sequence relative to at least one other expressed sequence. As a non-limiting example, a unique region may be in a variant of the expressed sequence for one of the known genes such that the region may be used to identify expression of the variant.

The sequences of the same genes have also been identified and characterized from other animal species. Thus the skilled person in the field is clearly aware of how to identify the disclosed genes relative to other animal genes. The skilled person may also optionally compare the known sequences of the disclosed genes from different animal sources to identify conserved regions and sequences unique to these genes relative to other genes.

Methods

As described herein, the disclosure includes the identity of genes, the expression of which can be used to provide prognostic information related to cancer. In particular, the expression levels of these genes may be used in relation to breast cancer. In some methods, the gene expression profile correlates with (and so is able to discriminate between) patients expected to benefit from a switch in endocrine therapy following an initial treatment with endocrine therapy for a period of time. In other embodiments, the disclosure includes a method to compare gene expression in a sample of cancer cells from a patient to the gene expression profile to determine the likely clinical or treatment outcome for the patient, or natural biological result, in the absence of a switch.

These embodiments of the disclosure may be advantageously used to meet an important unmet diagnostic need for the ability to predict whether a patient will likely benefit from a switch in treatment type. For example, a high H:I ratio value is strongly associated with response to a switch from first-line tamoxifen therapy for up to 5 years to letrozole therapy. The switch may occur anytime following the first-line therapy, such as immediately afterward, within three months after termination of first-line therapy, within six months after termination of first-line therapy, within nine months after termination of first-line therapy, within 12 months after termination of first-line therapy, within 18 months after termination of first-line therapy, or within 24 months (or more) after termination of first-line therapy.

So the disclosure includes a method to identify a patient, from a population of patients with ER+ breast cancer cells treated with a first endocrine therapy and cancer-free for a period of time, as belonging to a subpopulation of patients with a better prognosis if treated with an alternative endocrine therapy. In some cases, the breast cancer in the subject is node negative. The disclosure provides a non-subjective means for the identification of patients in the subpopulation.

The disclosure also includes a method of determining prognosis and/or survival outcome by assaying for the expression patterns disclosed herein. So where subjective interpretation may have been previously used to determine the prognosis and/or treatment of cancer patients, this disclosure provides objective gene expression patterns, which may be used alone or in combination with subjective criteria to provide a more accurate assessment of patient outcomes, including survival and the recurrence of cancer.

In some embodiments, the disclosure provides a method to determine therapeutic treatment for a cancer patient by determining prognosis for said patient by assaying a sample of cancer cells from said patient for the expression levels described herein, and selecting a treatment for a patient with such gene expression. The assaying may include measuring or detecting or determining the expression level of the genes in any suitable means described herein or known to the skilled person. In many cases, the cancer is breast cancer, and the subject is a human patient. Additionally, the cancer cells may be those of a tumor and/or from a node negative (lymph nodes negative for cancer) or node positive (lymph nodes positive for cancer) subject.

The requisite level of expression may be that which is identified by the methods described herein for the genes used. Additionally, the assaying may include preparing RNA from the sample, optionally for use in PCR (polymerase chain reaction) or other analytical methodology as described herein. The PCR methodology is optionally RT-PCR (reverse transcription-PCR) or quantitative PCR, such as real-time RT-PCR. Alternatively, the assaying may be conducted by use of an array, such as a microarray as known in the relevant field. Optionally, the sample of cancer cells is dissected from tissue removed or obtained from said subject. As described herein, a variety of sample types may be used, including a formalin fixed paraffin embedded (FFPE) sample as a non-limiting example. And as described herein, the method may include assaying or determining the H:I ratio (ratio of HoxB13 and IL17BR expression levels) in the sample as disclosed herein.

By way of non-limiting example, all five genes of the MGI may be assayed and used to detect expression levels that correspond to a value that is “high risk” (which is above the cutoff) for MGI, or to detect expression levels that correspond to a value that is “low risk” (which is at or below the cutoff) for MGI, as disclosed herein. In some cases, the MGI cutoff threshold may be 0 (zero), such as where the measurements of expression levels are standardized to 0 (zero) with a standard deviation of 1. In alternative embodiments, the cutoff may be at or about 0.05, at or about 0.10, at or about 0.15, at or about 0.20, at or about 0.25, at or about −0.05, at or about −0.10, at or about −0.15, at or about −0.20, at or about −0.25, at or about −0.30, at or about −0.35, at or about −0.40, at or about −0.45, at or about −0.50, at or about −0.55, at or about −0.60, at or about −0.65, at or about −0.70, at or about −0.75, at or about −0.80, at or about −0.85, at or about −0.90, at or about −0.95, at or about −1.0, at or about −1.1, at or about −1.2, at or about −1.3, at or about −1.4, at or about −1.5, at or about −1.6, at or about −1.7, at or about −1.8, at or about −1.9, at or about −2.0 or lower. With respect to the H:I ratio, its determination may be made as described in Ma et al., Cancer Cell, 5:607-16 (2004) and Ma et al. (2006) as referenced herein. For example, a value of 0.06 may be used to determine whether a sample has a “high risk” (>0.06) or “low risk” (≦0.06) H:I ratio.

So using a threshold, or cutoff, of 0 (zero) as a non-limiting example for MGI with all five genes, the disclosed methods provide two possible assay outcomes for a given sample: “high risk MGI” corresponding to a value above 0 (zero) and “low risk MGI” corresponding to a value ≦0. A “high risk MGI” is indicative of a “high risk” cancer, including breast cancer, that is analogous to that of a Grade III tumor as defined by methodologies and standards known in the field. A “low risk MGI” is indicative of a “low risk” cancer, including breast cancer, that is analogous to that of a Grade I tumor as defined by methodologies and standards known in the field.

In one embodiment of the disclosure, a method is provided for determining the risk or likelihood of cancer recurrence in a subject after treatment for breast cancer, such as removal of the cancer by surgery. The method may comprise i) preparing cDNA from nucleic acids in a sample of ER+ breast cancer cells removed from a breast cancer afflicted subject; ii) determining the expression levels of the seven genes in the disclosed Breast Cancer Index (BCI) from said cDNA to determine a BCI value; iii) identifying the subject as having been treated with endocrine therapy for a period of time without cancer recurrence; and iv) classifying the cancer as likely to recur due to a high risk BCI value. In some cases, the subject has been treated with endocrine therapy for about 5 years or more, about 4 years or more, about 3 years or more, about 2 years or more, or about 1 year or more.

In another embodiment of the disclosure, a method is provided for determining the likelihood of a beneficial switch in endocrine therapy as treatment for breast cancer. The method may comprise i) preparing cDNA from nucleic acids in a sample of ER+ breast cancer cells removed from a breast cancer afflicted subject; ii) determining the expression level of the HoxB13 gene from said cDNA; iii) optionally identifying the subject as having undergone surgical removal of the breast cancer; iv) identifying the subject as having been treated with a first endocrine therapy for a period of time without cancer recurrence; and v) classifying the subject as expected to benefit from treatment with a different second endocrine therapy after cessation of the first endocrine therapy, wherein said classifying is based upon an elevated expression level of HoxB13. In some cases, the elevated expression level of HoxB13 is determined as part of the H/I value and the classifying is based upon a high H/I value. In some cases, the subject has been treated with endocrine therapy for about 5 years or more, about 4 years or more, about 3 years or more, about 2 years or more, or about 1 year or more.

In additional embodiments, the disclosure provides a method to treat a patient that has undergone a first endocrine therapy as described above. The method may comprise the above determining the likelihood of a beneficial switch in endocrine therapy as treatment for breast cancer followed by treating the patient with a second endocrine therapy after ending treatment with the first endocrine therapy.

As non-limiting examples, the first endocrine therapy may be treatment with an SERM or an SERD and the second endocrine therapy may be treatment with an aromatase inhibitor. Alternatively, the first endocrine therapy may be treatment with an aromatase inhibitor and the second endocrine therapy may be treatment with an SERM or an SERD. Embodiments include tamoxifen as the first endocrine therapy followed by letrozole as the second, or letrozole as the first endocrine therapy followed by tamoxifen as the second.

The disclosure further includes a method of determining a prognostic factor or predictor of clinical responsiveness in pre-menopausal women and post-menopausal women. Post-menopausal women may be defined as those that are ≧50 years old while pre-menopausal women may be defined as those who are less than 50 years old.

The ability to discriminate is conferred by the identification of expression of the individual genes as relevant and not by the form of the assay used to determine the actual level of expression. An assay may utilize any identifying feature of an identified individual gene as disclosed herein as long as the assay reflects, quantitatively or qualitatively, expression of the gene in the “transcriptome” (the transcribed fraction of genes in a genome) or the “proteome” (the translated fraction of expressed genes in a genome). Identifying features include, but are not limited to, unique nucleic acid sequences used to encode (DNA), or express (RNA), said gene or epitopes specific to, or activities of, a protein encoded by said gene. All that is required is the identity of the gene(s) necessary to discriminate between cancer outcomes and an appropriate cell containing sample for use in an expression assay.

Similarly, the nature of the cell containing sample is not limiting, as fresh tissue, freshly frozen tissue, and fixed tissue, such as formalin-fixed paraffin-embedded (FFPE) tissues, may be used in the disclosed methods.

In one embodiment, the disclosure provides for the identification of the gene expression patterns by analyzing global, or near global, gene expression from single cells or homogenous cell populations which have been dissected away from, or otherwise isolated or purified from, contaminating cells beyond that possible by a simple biopsy. Because the expression of numerous genes fluctuates between cells from different patients as well as between cells from the same patient sample, the levels of gene expression may be determined in correspondence to one or more “control” or “normalization” genes, the expression(s) of which are relatively constant in the cells of a patient or between patients.

In another aspect, the disclosure includes physical and methodological means for detecting the expression of gene(s) identified by the models generated by individual expression patterns. These means may be directed to assaying one or more aspects of the DNA template(s) underlying the expression of the gene(s), of the RNA used as an intermediate to express the gene(s), or of the proteinaceous product expressed by the gene(s).

One advantage provided by the disclosure is that contaminating, non-cancer cells (such as infiltrating lymphocytes or other immune system cells) are not present to possibly affect the genes identified or the subsequent analysis of gene expression to identify the cancer recurrence and/or survival outcomes of patients. Such contamination is present where a biopsy containing many cell types is used to assay gene expression profiles.

While the present disclosure is described mainly in the context of human cancer, such as breast cancer, it may be practiced in the context of cancer of any animal. Preferred animals for the application of the present disclosure are mammals, particularly those important to agricultural applications (such as, but not limited to, cattle, sheep, horses, and other “farm animals”), animal models of cancer, and animals for human companionship (such as, but not limited to, dogs and cats).

The methods provided by the disclosure may also be automated in whole or in part.

Kits

The materials for use in the methods of the present disclosure are ideally suited for preparation of kits produced in accordance with well known procedures. The disclosure thus provides kits comprising agents for the detection of expression of the disclosed genes for grading tumors or determining cancer outcomes. Such kits optionally comprise the agent with an identifying description or label or instructions relating to their use in the methods of the present disclosure. Such a kit may comprise containers, each with one or more of the various reagents (typically in concentrated form) utilized in the methods, including, for example, pre-fabricated microarrays, buffers, the appropriate nucleotide triphosphates (e.g., dATP, dCTP, dGTP and dTTP; or rATP, rCTP, rGTP and UTP), reverse transcriptase, DNA polymerase, RNA polymerase, and one or more primer complexes of the present disclosure (e.g., appropriate length poly(T) or random primers linked to a promoter reactive with the RNA polymerase). A set of instructions will also typically be included.

Having now generally provided the disclosure, the same will be more readily understood through reference to the following examples which are provided by way of illustration, and are not intended to be limiting of the disclosure, unless specified.

EXAMPLES

Example I General

Patients and Tumor Samples

Samples from the NCIC CTG MA.17 cohort (see Goss et al., J. Clin. Oncol., 26(12):1948-1955, 2008) were used. 100 cases with 200 controls were used. The 100 cases included 61 cases of distant cancer recurrence; 17 cases of local cancer recurrence; 5 cases of regional cancer recurrence; 16 cases of contralateral recurrences; and 1 unknown case. Of these, the contralateral and unknown cases were excluded.

Clinical follow-up data were available for the samples used, which were formalin-fixed paraffin-embedded (FFPE) tumor blocks from the time of diagnosis. Odds ratios were calculated with analysis of BCI, H:I, HoxB13 and MGI as continuous and categorical variables. Multivariate analysis also included age, tumor grade and treatment in the analysis. Treatment interaction: age and tumor grade were also included in the analysis. P-values were calculated for the interaction term.

Table 1 summarizes characteristics for the cases and controls (N=249).

TABLE 1 Patient and tumor characteristics

Factor                      Description   MA-17 overall   Case-control study   Cases       Controls       P-value
                                          (n = 5157)      (n = 249)            (n = 83)    (n = 166)
Age                         <50           —               9 (4%)               4 (5%)      5 (3%)         0.64
                            >=50, <60                     83 (33%)             27 (32%)    56 (34%)
                            >=60, <70                     82 (33%)             24 (29%)    58 (35%)
                            >=70                          75 (30%)             28 (34%)    47 (28%)
Tumor Grade                 1             —               26 (10%)             6 (7%)      20 (12%)       0.28
                            2                             166 (67%)            54 (65%)    112 (67.5%)
                            3                             57 (23%)             23 (27%)    34 (20.5%)
Tumor Type                  Ductal        —               218 (88%)            71 (86%)    147 (88.6%)    0.63
                            Lobular                       31 (12%)             12 (14%)    19 (11.4%)
N Stage                     N0                            94 (38%)             31 (37%)    63 (38%)       0.45
                            N1                            138 (55%)            44 (53%)    94 (57%)
                            N2, N3, NX                    17 (7%)              8 (10%)     9 (5%)
T Stage                     T1                            110 (44%)            37 (45%)    73 (44%)       0.58
                            T2                            111 (45%)            35 (42%)    76 (46%)
                            T3                            21 (8%)              7 (8%)      14 (8%)
                            T4, TX                        7 (3%)               4 (5%)      3 (2%)
Prior Chemo Treatment       No                            148 (59%)            49 (59%)    99 (60%)       0.96
                            Yes                           101 (41%)            34 (41%)    67 (40%)
Prior Radiation Treatment   No                            150 (60%)            49 (59%)    101 (61%)      0.89
                            Yes                           99 (40%)             34 (41%)    65 (39%)
Treatment Arm               Letrozole                     122 (49%)            31 (37%)    91 (55%)       0.01
                            Placebo                       127 (51%)            52 (63%)    75 (45%)

Real-Time RT-PCR Assays for H/I and MGI

Primer and probe sequences for HOXB13 and IL17BR, as well as control genes ESR1, PGR, CHDH, ACTB, HMBS, SDHA and UBC, were used as described previously (Ma et al., supra). Primer and probe sequences for the five molecular grade genes (BUB1B, CENPA, NEK2, RACGAP1 and RRM2) as well as ERBB2 (HER2) were prepared using Primer Express (ABI).

Sections of each FFPE sample were used for RNA extraction. Gross macro-dissection was used to enrich for tumor content. RNA extraction, reverse transcription, and TaqMan RT-PCR using the ABI 7900HT instrument (Applied Biosystem, Inc) were performed as described before (Ma et al., id.). The cycling threshold numbers (CTs) were normalized to the mean CT of four reference genes (ACTB, HMBS, SDHA and UBC). The use of these genes is supported by the previous reports regarding these genes and representative sequences of each of these genes known to the skilled person. Normalized CTs were taken to represent relative gene expression levels.
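The normalization of CT values to the mean CT of the four reference genes can be sketched as follows. This is an illustrative fragment only: the function name and the raw CT values are hypothetical, and the direction of the subtraction (target CT minus mean reference CT) is an assumption, since the text does not state the sign convention.

```python
from statistics import mean

# Reference genes named in the text.
REFERENCE_GENES = ("ACTB", "HMBS", "SDHA", "UBC")


def normalize_ct(raw_ct):
    """Normalize each target gene's CT to the mean CT of the reference genes.

    raw_ct maps gene symbol -> CT value for one sample.
    Assumption: normalized CT = target CT - mean reference CT.
    """
    ref_mean = mean(raw_ct[gene] for gene in REFERENCE_GENES)
    return {
        gene: ct - ref_mean
        for gene, ct in raw_ct.items()
        if gene not in REFERENCE_GENES
    }


# Hypothetical CT values for one sample, for illustration only.
sample_ct = {
    "ACTB": 24.0, "HMBS": 28.0, "SDHA": 27.0, "UBC": 25.0,
    "HOXB13": 31.0, "IL17BR": 29.5,
}
normalized = normalize_ct(sample_ct)
```

Averaging over several reference genes, as the text describes, makes the normalization less sensitive to variation in any single reference gene.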

Calculation of H/I and MGI

Generally, and with respect to MGI, it is preferred that the expression levels of the disclosed genes are combined to form a single index that serves as a strong prognostic factor and predictor of clinical outcome(s). The index is a summation of the expression levels of the genes used and uses coefficients determined from principal component analysis to combine cases of more than one disclosed gene into a single index. The coefficients are determined by factors such as the standard deviation of each gene's expression levels across a representative dataset, and the expression value for each gene in each sample. The representative dataset is quality controlled based upon the average expression values for reference gene(s) as disclosed herein.

Stated differently, and with respect to MGI, normalized expression levels for the five genes from microarrays or RT-PCR were standardized to mean of 0 and standard deviation of 1 across samples within each dataset and then combined into a single index per sample via principal component analysis (PCA) using the first principal component. Standardization of the primary expression data within each dataset was necessary to account for the different platforms (microarrays and RT-PCR) and sample types (frozen and FFPE). As a result, and following scaling parameters, a formula for the summation of expression values that defines the index is generated. The precision of the scaling parameters can then be tested based on the means, standard errors, and standard deviations (with confidence intervals) of the expression levels of the genes across the data set. Therefore, generation of the formula for the index is dependent upon the dataset, reference gene, and genes of the MGI.
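The standardization and first-principal-component steps described above may be illustrated with the following sketch. It is not the formula of the disclosure: the expression values are hypothetical, the function names are illustrative, and the loadings are approximated by power iteration on the covariance matrix of the standardized genes, a swapped-in technique chosen here so that the example needs only the Python standard library.

```python
from statistics import mean, stdev


def standardize(values):
    """Scale one gene's levels to mean 0 and standard deviation 1 across samples."""
    m, s = mean(values), stdev(values)
    return [(v - m) / s for v in values]


def first_component_weights(z_by_gene, iterations=200):
    """Approximate first principal component loadings by power iteration."""
    genes = list(z_by_gene)
    n = len(z_by_gene[genes[0]])
    # Covariance matrix of the standardized genes.
    cov = [[sum(a * b for a, b in zip(z_by_gene[g], z_by_gene[h])) / (n - 1)
            for h in genes] for g in genes]
    w = [1.0] * len(genes)
    for _ in range(iterations):
        w = [sum(c * x for c, x in zip(row, w)) for row in cov]
        norm = sum(x * x for x in w) ** 0.5
        w = [x / norm for x in w]
    return dict(zip(genes, w))


def index_per_sample(z_by_gene, weights):
    """Weighted sum of standardized levels: one index value per sample."""
    n = len(next(iter(z_by_gene.values())))
    return [sum(weights[g] * z_by_gene[g][k] for g in z_by_gene) for k in range(n)]


# Hypothetical normalized expression levels (4 samples), for illustration only.
levels = {
    "BUB1B":   [1.0, 2.0, 3.0, 4.0],
    "CENPA":   [0.5, 1.5, 2.0, 3.5],
    "NEK2":    [2.0, 2.5, 3.5, 4.5],
    "RACGAP1": [1.0, 1.8, 3.1, 3.9],
    "RRM2":    [0.2, 1.1, 2.4, 3.0],
}
z = {gene: standardize(vals) for gene, vals in levels.items()}
weights = first_component_weights(z)
mgi = index_per_sample(z, weights)
```

Because each gene is standardized to mean 0 before weighting, the resulting index is centered on 0 within the dataset, which is consistent with the use of 0 as a cutoff elsewhere in the text.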

The HOXB13:IL17BR ratio was calculated as the difference in standardized expression levels between HOXB13 and IL17BR as described previously (Ma et al., id.). The means and standard deviations for HOXB13 and IL17BR used for standardizing the Table 1 cohort may be derived from an analysis of 190 FFPE tissue sections from a separate population-based cohort of estrogen receptor-positive lymph node-negative breast cancer patients.

For MGI, obviously abnormal raw C_(T) values were removed prior to averaging the values over duplicates for each gene and each sample. The averaged raw C_(T) value for each gene was then normalized by the averaged C_(T) value of four reference genes (ACTB, HMBS, SDHA, and UBC). The normalized expression levels (ΔC_(T)) for the five genes were combined into a single index per sample, which can be compared to a pre-determined cutoff value, such as 0, where high MGI is above the cutoff and low MGI is below the cutoff.
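A minimal sketch of the H:I calculation as a difference in standardized expression levels is given below. The reference means and standard deviations shown are hypothetical placeholders, not the values derived from the separate cohort, and the function names are illustrative.

```python
def standardize(value, ref_mean, ref_sd):
    """Standardize one expression level against reference mean and SD."""
    return (value - ref_mean) / ref_sd


def hi_index(hoxb13_level, il17br_level, reference):
    """H:I as standardized HOXB13 level minus standardized IL17BR level.

    reference maps gene symbol -> (mean, standard deviation); in the text
    these parameters come from a separate reference cohort.
    """
    z_h = standardize(hoxb13_level, *reference["HOXB13"])
    z_i = standardize(il17br_level, *reference["IL17BR"])
    return z_h - z_i


# Hypothetical reference parameters and sample levels, for illustration only.
reference = {"HOXB13": (4.0, 2.0), "IL17BR": (3.0, 1.0)}
value = hi_index(6.0, 3.5, reference)
```

Using externally derived means and standard deviations, rather than those of the sample set being scored, keeps the index for a given sample independent of which other samples happen to be analyzed with it.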

Continuous BCI

A continuous risk model was built by combining H:I and MGI as continuous variables. The linearity of these two variables was checked by fitting a Cox proportional hazard regression model with restricted cubic splines, and H:I demonstrated significant non-linearity. A polynomial function of H:I was used to approximate the restricted models using the Akaike Information Criterion. The resulting predictor from the final Cox regression model was then re-scaled into the range of 0 to 10, which is referred to as the BCI.

The BCI is further categorized into three levels: low risk, BCI<5; intermediate risk, 5≦BCI<6.4; high risk, BCI>6.4. These cut-offs were chosen such that the resulting proportions of low, intermediate, and high risk groups were similar to those formed by the three categorical combination groups of H:I and MGI.
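The three-level categorization can be expressed as a short function, sketched below. The function name is illustrative. Note one labeled assumption: the stated ranges do not assign a BCI of exactly 6.4 to a group, and this sketch places that boundary value in the high risk group.

```python
def bci_risk_group(bci):
    """Map a BCI value on the 0 to 10 scale to a risk group.

    low: BCI < 5; intermediate: 5 <= BCI < 6.4; high: BCI > 6.4.
    Assumption: a BCI of exactly 6.4 is treated as high risk, since the
    stated ranges leave that single value unassigned.
    """
    if not 0 <= bci <= 10:
        raise ValueError("BCI is expected to lie in the range 0 to 10")
    if bci < 5:
        return "low"
    if bci < 6.4:
        return "intermediate"
    return "high"


groups = [bci_risk_group(v) for v in (2.1, 5.0, 6.3, 7.8)]
```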

Cut-Points and Statistical Analyses

H/I CUT-POINT: The cutpoint of 0.06 for the HOXB13:IL17BR ratio, previously defined to stratify patients treated with adjuvant tamoxifen into low and high risk of recurrence, may be used in this study.

MGI CUT-POINT: The calculation and the cutpoint for MGI were defined without using any clinical outcome data; the cutpoint was instead a natural cutpoint. Initial analysis of MGI in the Uppsala cohort indicated good discrimination of grade 1 and grade 3 tumors using the mean (0) as cutpoint, and model-based clustering of MGI also indicated a bimodal distribution with a natural cutpoint around 0. This cutpoint was further supported by receiver operating characteristic (ROC) analysis.

STATISTICAL ANALYSES: Kaplan-Meier analysis with logrank test and Cox proportional hazards regression were performed to assess the association of gene expression indexes with clinical outcome. Multivariate Cox regression models were performed to assess the prognostic capacity of gene expression indexes after adjusting for known prognostic factors.

The proportional hazards (PH) assumption was checked by scaled Schoenfeld residuals; variables violating the PH assumption were adjusted for in the model through stratification. To account for the case-cohort design of the Table 1 cohort, we used weighted Kaplan-Meier analysis and Cox regression models with modifications to handle case-cohort designs (see ^(19,20)) as implemented in the survey package in R (www.r-project.org). To test for interaction between dichotomized MGI and the H:I ratio in Cox regression models, the Wald statistic was used in the Table 1 cohort and the likelihood ratio test was used in the last cohort.

Correlations of continuous variables with categorical factors were examined using the non-parametric two-sample Wilcoxon test, or the Kruskal-Wallis test for factors with more than two levels.

All statistical analyses were performed in the R statistical environment. All significance tests were two-sided, and p<0.05 was considered significant.

Example II Prognostic Performance

Table 2 shows the distribution of the cases and controls to the continuous BCI risk groups.

TABLE 2
BCI group, (%)   Cases (n = 83)   Controls (n = 166)
Low              43.4%            57.8%
Intermediate     22.9%            18.1%
High             33.7%            24.1%

Table 3 shows the univariate analysis in relation to cancer recurrence in MA.17 subjects.

TABLE 3 Univariate analysis in relation to cancer recurrence
                                   Odds Ratio (95% CI)   P-value
Treatment (Placebo vs Letrozole)   2.02 (1.17-3.47)      0.01
Tumor Grade                                              0.28
  II vs. I                         1.73 (0.61-4.88)      0.30
  III vs. I                        2.53 (0.81-7.90)      0.11
Analysis of BCI
  BCI                              2.38 (1.21-4.69)      0.01
  BCI, High vs Low                 1.87 (1.00-3.50)      0.05
Analysis of components of BCI
  HoxB13                           1.34 (1.05-1.70)      0.02
  HoxB13, High vs Low              2.17 (1.27-3.69)      0.004
  H:I                              2.52 (1.08-5.85)      0.03
  H:I, High vs Low                 1.68 (1.00-2.81)      0.049
  MGI                              1.83 (0.93-3.58)      0.08
  MGI, High vs Low                 1.49 (0.87-2.56)      0.15

Table 4 shows the multivariate analysis in relation to cancer recurrence.

TABLE 4 Multivariate analysis in relation to cancer recurrence
                                   Odds Ratio (95% CI)   P-value
Analysis with BCI
  BCI                              2.37 (1.08-5.22)      0.03
  BCI, High vs Low                 1.87 (0.88-3.95)      0.10
Analysis with components of BCI
  HoxB13                           1.35 (1.05-1.74)      0.02
  HoxB13, High vs Low              2.32 (1.32-4.10)      0.004
  H:I                              2.55 (1.03-6.32)      0.04
  H:I, High vs Low                 1.71 (0.98-2.97)      0.06
  MGI                              1.61 (0.73-3.54)      0.24
  MGI, High vs Low                 1.37 (0.73-2.55)      0.33

As shown by the above, BCI is prognostic of late cancer recurrences in ER+ patients following 5 years of tamoxifen treatment. HoxB13 expression is also prognostic of late cancer recurrences in ER+ patients following 5 years of tamoxifen treatment.

Example III Biomarker and Treatment Interaction

Table 5 shows interactions between the gene expression analyzed and treatment.

TABLE 5 Biomarker and Treatment Interaction
                                   P-value
HoxB13                             0.047
H:I                                0.97
MGI                                0.06
BCI                                0.42
BCI High vs Intermediate + Low     0.08

Table 6 shows the distribution of HoxB13 gene expression in relation to treatments used. HoxB13 expression at diagnosis predicts patient benefit from extended endocrine therapy with letrozole after 5 years of adjuvant tamoxifen therapy.

TABLE 6
              Letrozole              Placebo
              Controls   Cases       Controls   Cases       P-value
Low HoxB13    48         14 (23%)    46         16 (26%)    0.83
High HoxB13   43         17 (28%)    29         36 (55%)    0.004
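
The case percentages in Table 6 follow directly from the counts, and the contrast between treatment arms can be explored with a short sketch. The helper names below are hypothetical. The text does not state which test produced the P-values in Table 6, so the two-sided Fisher exact test here is an assumption for illustration, not a reproduction of the reported analysis.

```python
from math import comb

# Counts from Table 6, stored as (controls, cases) per treatment arm.
TABLE_6 = {
    "Low HoxB13": {"letrozole": (48, 14), "placebo": (46, 16)},
    "High HoxB13": {"letrozole": (43, 17), "placebo": (29, 36)},
}


def case_fraction(controls, cases):
    """Fraction of subjects in an arm that are cases."""
    return cases / (controls + cases)


def odds_ratio(arm_a, arm_b):
    """Odds of being a case in arm_a relative to arm_b."""
    (ctrl_a, case_a), (ctrl_b, case_b) = arm_a, arm_b
    return (case_a / ctrl_a) / (case_b / ctrl_b)


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1, n = a + b, c + d, a + c, a + b + c + d

    def prob(x):
        # Hypergeometric probability of x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(n, col1)

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum over all tables at least as extreme (no more probable) as observed.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-9))


for group, arms in TABLE_6.items():
    let, pla = arms["letrozole"], arms["placebo"]
    p = fisher_exact_two_sided(let[0], let[1], pla[0], pla[1])
    print(group, round(case_fraction(*let), 2), round(case_fraction(*pla), 2),
          round(odds_ratio(pla, let), 2), round(p, 3))
```

Each printed row gives the case fraction under letrozole, the case fraction under placebo, the placebo-versus-letrozole odds ratio, and the exact p-value for that HoxB13 group.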

BIBLIOGRAPHY

-   1. Ma et al., Cancer Cell, 5:607-16 (2004)
-   2. Ma et al., J. Clin. Oncol., 24:4611-9 (2006)
-   3. Goetz et al., Clin. Cancer Res., 12:2080-7 (2006)
-   4. Jerevall et al., Breast Cancer Res. Treat. (2007)
-   5. Jansen et al., J. Clin. Oncol., 25:662-8 (2007)
-   6. Cianfrocca et al., Oncologist, 9:606-16 (2004)
-   7. Sotiriou et al., J. Natl. Cancer Inst., 98:262-72 (2006)
-   8. van 't Veer et al., Nature, 415:530-6 (2002)
-   9. Paik et al., N. Engl. J. Med., 351:2817-26 (2004)
-   10. Desmedt et al., Cell Cycle, 5:2198-202 (2006)
-   11. Loi et al., J. Clin. Oncol., 25:1239-46 (2007)
-   12. Sotiriou et al., Nat. Rev. Cancer, 7:545-53 (2007)
-   13. Miller et al., Proc. Natl. Acad. Sci. USA, 102:13550-5 (2005)
-   14. Pawitan et al., Breast Cancer Res., 7:R953-64 (2005)
-   15. Rundle et al., Cancer Epidemiol. Biomarkers Prev., 14:1899-907 (2005)
-   16. Ma et al., Proc. Natl. Acad. Sci. USA, 100:5974-9 (2003)
-   17. Whitfield et al., Mol. Biol. Cell, 13:1977-2000 (2002)
-   18. Hirose et al., J. Biol. Chem., 276:5821-5828 (2001)
-   19. Goldhirsch et al., Ann. Oncol., 16:1569-83 (2005)

All references cited herein, including patents, patent applications, and publications, are hereby incorporated by reference in their entireties, whether previously specifically incorporated or not.

Having now fully described the inventive subject matter, it will be appreciated by those skilled in the art that the same can be performed within a wide range of equivalent parameters, concentrations, and conditions without departing from the spirit and scope of the disclosure and without undue experimentation.

While this disclosure has been described in connection with specific embodiments thereof, it will be understood that it is capable of further modifications. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains and as may be applied to the essential features hereinbefore set forth.

What is claimed is:
 1. A method comprising: preparing cDNA from nucleic acids in a sample of ER+ breast cancer cells from a breast cancer afflicted subject, determining the expression level of the HoxB13 gene from said cDNA, identifying the subject as having undergone removal of the breast cancer and as having been treated with a first endocrine therapy for a period of time without cancer recurrence, and classifying the subject as expected to benefit from treatment with the same first endocrine therapy or a different second endocrine therapy after cessation of the first endocrine therapy, wherein said classifying is based upon an elevated expression level of HoxB13.
 2. A method comprising: preparing cDNA from nucleic acids in a sample of ER+ breast cancer cells from a breast cancer afflicted subject, determining the expression levels of the seven genes in the disclosed Breast Cancer Index (BCI) from said cDNA, wherein said genes are HoxB13, IL17BR, Bub1B, CENPA, NEK2, RACGAP1, and RRM2, to determine a BCI value, identifying the subject as having undergone removal of the breast cancer and as having been treated with endocrine therapy for a period of time without cancer recurrence, and classifying the cancer as likely to recur due to a high risk BCI value.
 3. The method of claim 1 wherein said nucleic acids are mRNA from said sample.
 4. The method of claim 3 wherein said RNA is used for PCR (polymerase chain reaction).
 5. The method of claim 1 wherein said determining comprises using an array.
 6. The method of claim 1 wherein said sample is dissected from tissue removed from said subject.
 7. The method of claim 4 wherein said PCR is RT-PCR (reverse transcription-PCR), optionally real time RT-PCR.
 8. The method of claim 1 wherein said sample is a formalin fixed paraffin embedded (FFPE) sample.
 9. The method of claim 1, further comprising treating said subject with letrozole after cessation of tamoxifen treatment.
 10. The method of claim 1 further comprising assaying for the H:I ratio in said sample and wherein said ratio is used to indicate an elevated expression level of HoxB13.
 11. The method of claim 1 wherein said cancer is ductal carcinoma in situ (DCIS) and said cancer recurrence comprises local recurrence.
 12. The method of claim 1 wherein the first endocrine therapy is treatment with tamoxifen and the second endocrine therapy is treatment with letrozole.
 13. A method to treat a cancer patient, said method comprising preparing cDNA from nucleic acids in a sample of ER+ breast cancer cells from a breast cancer afflicted subject, determining the expression level of the HoxB13 gene from said cDNA, identifying the subject as having undergone removal of the breast cancer and as having been treated with a first endocrine therapy for a period of time without cancer recurrence, classifying the subject as expected to benefit from treatment with the same first endocrine therapy or a different second endocrine therapy after cessation of the first endocrine therapy, wherein said classifying is based upon an elevated expression level of HoxB13, and treating the patient with the second endocrine therapy after ending treatment with the first endocrine therapy.
 14. The method of claim 13, further comprising assaying for the H:I ratio in said sample, wherein said ratio is used to indicate an elevated expression level of HoxB13.
 15. The method of claim 13 wherein the first endocrine therapy is treatment with tamoxifen and the second endocrine therapy is treatment with letrozole.